Automated Dynamic Analysis of CUDA Programs
نویسندگان
چکیده
Recent increases in the programmability and performance of GPUs have led to a surge of interest in utilizing them for general-purpose computations. Tools such as NVIDIA’s Cuda allow programmers to use a C-like language to code algorithms for execution on the GPU. Unfortunately, parallel programs are prone to subtle correctness and performance bugs, and Cuda tool support for solving these remains a work in progress. As a first step towards addressing these problems, we present an automated analysis technique for finding two specific classes of bugs in Cuda programs: race conditions, which impact program correctness, and shared memory bank conflicts, which impact program performance. Our technique automatically instruments a program in two ways: to keep track of the memory locations accessed by different threads, and to use this data to determine whether bugs exist in the program. The instrumented source code can be run directly in Cuda’s device emulation mode, and any potential errors discovered will be automatically reported to the user. This automated analysis can help programmers find and solve subtle bugs in programs that are too complex to analyze manually. Although these issues are explored in the context of Cuda programs, similar issues will arise in any sufficiently “manycore” architecture.
منابع مشابه
PUG : A Symbolic Verifier of GPU Programs
There is increasing interest in utilizing Graphical Processing Units for general-purpose computations. While substantial effort has been made on improving the programmability and performance of General Purpose GPU systems, little attention has been paid to verifying the correctness of the programs running on these systems. We present a preliminary automated symbolic verifier based on mechanical...
متن کاملGPUDrano: Detecting Uncoalesced Accesses in GPU Programs
Graphics Processing Units (GPUs) have become widespread and popular over the past decade. Fully utilizing the parallel compute and memory resources that GPUs present remains a significant challenge, however. In this paper, we describe GPUDrano: a scalable static analysis that detects uncoalesced global memory accesses in CUDA programs. Uncoalesced global memory accesses arise when a GPU program...
متن کاملPIPS Is not (just) Polyhedral Software Adding GPU Code Generation in PIPS
Parallel and heterogeneous computing are growing in audience thanks to the increased performance brought by ubiquitous manycores and GPUs. However, available programming models, like OPENCL or CUDA, are far from being straightforward to use. As a consequence, several automated or semi-automated approaches have been proposed to automatically generate hardware-level codes from high-level sequenti...
متن کاملIn-memory OLAP aggregation on GPUs using CUDA Dynamic Parallelism
Most queries involved with Online Analytical Processing (OLAP) depend on the functionality of aggregating data along the multidimensional hierarchies of an OLAP cube. In real-time OLAP, aggregated data for interactive operations e.g. roll-up and drill-down is computed on-the-fly. Fast response times are essential and can be accelerated significantly through data-parallel computation on graphics...
متن کاملUnsafe Floating-point to Unsigned Integer Casting Check for GPU Programs
Numerical programs usually include type-casting instructions which convert data among different types. Identifying unsafe type-casting is important for preventing undefined program behaviors which cause serious problems such as security vulnerabilities and result non-reproducibility. While many tools had been proposed for handling sequential programs, to our best knowledge, there isn’t a tool g...
متن کامل